Overview

Dataset statistics

Number of variables: 15
Number of observations: 5530
Missing cells: 5933
Missing cells (%): 7.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 648.2 KiB
Average record size in memory: 120.0 B
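
The percentage and per-record figures above can be reproduced from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 5530
missing_cells = 5933
total_size_kib = 648.2

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the 7.2% and 120.0 B shown in the table.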

Variable types

Categorical: 6
Numeric: 9

Warnings

CUST_ID has a high cardinality: 5530 distinct values High cardinality
CASH_ADVANCE has a high cardinality: 2609 distinct values High cardinality
PURCHASES_TRX has a high cardinality: 80 distinct values High cardinality
MINIMUM_PAYMENTS has a high cardinality: 5441 distinct values High cardinality
PURCHASES is highly correlated with ONEOFF_PURCHASES_FREQUENCY High correlation
ONEOFF_PURCHASES_FREQUENCY is highly correlated with PURCHASES High correlation
BALANCE is highly correlated with CASH_ADVANCE_TRX and 1 other field High correlation
PURCHASES is highly correlated with PURCHASES_FREQUENCY and 1 other field High correlation
CASH_ADVANCE_TRX is highly correlated with BALANCE and 1 other field High correlation
PURCHASES_FREQUENCY is highly correlated with PURCHASES and 1 other field High correlation
ONEOFF_PURCHASES_FREQUENCY is highly correlated with PURCHASES High correlation
CASH_ADVANCE_FREQUENCY is highly correlated with BALANCE and 2 other fields High correlation
PURCHASES is highly correlated with PURCHASES_FREQUENCY High correlation
CASH_ADVANCE_TRX is highly correlated with CASH_ADVANCE_FREQUENCY High correlation
PURCHASES_FREQUENCY is highly correlated with PURCHASES High correlation
CASH_ADVANCE_FREQUENCY is highly correlated with CASH_ADVANCE_TRX High correlation
PURCHASES is highly correlated with PURCHASES_TRX and 1 other field High correlation
PURCHASES_TRX is highly correlated with PURCHASES and 1 other field High correlation
CREDIT_LIMIT is highly correlated with BALANCE High correlation
ONEOFF_PURCHASES_FREQUENCY is highly correlated with PURCHASES_TRX High correlation
BALANCE is highly correlated with CREDIT_LIMIT High correlation
PAYMENTS is highly correlated with PURCHASES High correlation
GENDER has 2714 (49.1%) missing values Missing
CASH_ADVANCE_TRX has 150 (2.7%) missing values Missing
ONEOFF_PURCHASES_FREQUENCY has 2740 (49.5%) missing values Missing
CASH_ADVANCE_FREQUENCY has 166 (3.0%) missing values Missing
TENURE has 163 (2.9%) missing values Missing
CUST_ID is uniformly distributed Uniform
CUST_ID has unique values Unique
PAYMENTS has unique values Unique
PURCHASES has 1393 (25.2%) zeros Zeros
CASH_ADVANCE_TRX has 2812 (50.8%) zeros Zeros
PURCHASES_FREQUENCY has 1392 (25.2%) zeros Zeros
ONEOFF_PURCHASES_FREQUENCY has 1464 (26.5%) zeros Zeros
CASH_ADVANCE_FREQUENCY has 2801 (50.7%) zeros Zeros

Reproduction

Analysis started: 2022-03-06 19:10:41.020455
Analysis finished: 2022-03-06 19:10:53.729491
Duration: 12.71 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
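
The reported duration is the difference between the two timestamps. A small sketch that checks this with the standard library, assuming both timestamps are in the same time zone (the format string is an assumption about how they were written):

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-03-06 19:10:41.020455", FMT)
finished = datetime.strptime("2022-03-06 19:10:53.729491", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```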

Variables

CUST_ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 5530
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 43.3 KiB

C14071 | 1
C14465 | 1
C17245 | 1
C15632 | 1
C13761 | 1
Other values (5525) | 5525

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 33180
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
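
As an illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiling tool's own implementation; the helper name `category_counts` is hypothetical, and the input is the five sample CUST_ID values listed in this report.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Sample rows of CUST_ID as shown in this report.
sample_ids = ["C12529", "C14138", "C15409", "C18141", "C15879"]
counts = category_counts(sample_ids)

# 'Nd' is Decimal Number, 'Lu' is Uppercase Letter.
print(dict(counts))
```

The two-letter codes map onto the category names used in the tables: `Nd` (Decimal Number), `Lu` (Uppercase Letter), `Ll` (Lowercase Letter), `Po` (Other Punctuation), `Pd` (Dash Punctuation).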

Unique

Unique: 5530
Unique (%): 100.0%

Sample

1st row: C12529
2nd row: C14138
3rd row: C15409
4th row: C18141
5th row: C15879

Common Values

Value | Count | Frequency (%)
C14071 | 1 | < 0.1%
C14465 | 1 | < 0.1%
C17245 | 1 | < 0.1%
C15632 | 1 | < 0.1%
C13761 | 1 | < 0.1%
C14336 | 1 | < 0.1%
C15541 | 1 | < 0.1%
C16112 | 1 | < 0.1%
C17180 | 1 | < 0.1%
C11150 | 1 | < 0.1%
Other values (5520) | 5520 | 99.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
c12941 | 1 | < 0.1%
c11435 | 1 | < 0.1%
c17341 | 1 | < 0.1%
c15178 | 1 | < 0.1%
c17814 | 1 | < 0.1%
c13638 | 1 | < 0.1%
c16928 | 1 | < 0.1%
c10794 | 1 | < 0.1%
c17777 | 1 | < 0.1%
c10800 | 1 | < 0.1%
Other values (5520) | 5520 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
1 | 7787 | 23.5%
C | 5530 | 16.7%
8 | 2324 | 7.0%
7 | 2286 | 6.9%
6 | 2276 | 6.9%
4 | 2265 | 6.8%
5 | 2244 | 6.8%
2 | 2231 | 6.7%
3 | 2222 | 6.7%
0 | 2213 | 6.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 27650 | 83.3%
Uppercase Letter | 5530 | 16.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 7787 | 28.2%
8 | 2324 | 8.4%
7 | 2286 | 8.3%
6 | 2276 | 8.2%
4 | 2265 | 8.2%
5 | 2244 | 8.1%
2 | 2231 | 8.1%
3 | 2222 | 8.0%
0 | 2213 | 8.0%
9 | 1802 | 6.5%
Uppercase Letter
Value | Count | Frequency (%)
C | 5530 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 27650 | 83.3%
Latin | 5530 | 16.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 7787 | 28.2%
8 | 2324 | 8.4%
7 | 2286 | 8.3%
6 | 2276 | 8.2%
4 | 2265 | 8.2%
5 | 2244 | 8.1%
2 | 2231 | 8.1%
3 | 2222 | 8.0%
0 | 2213 | 8.0%
9 | 1802 | 6.5%
Latin
Value | Count | Frequency (%)
C | 5530 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 33180 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 7787 | 23.5%
C | 5530 | 16.7%
8 | 2324 | 7.0%
7 | 2286 | 6.9%
6 | 2276 | 6.9%
4 | 2265 | 6.8%
5 | 2244 | 6.8%
2 | 2231 | 6.7%
3 | 2222 | 6.7%
0 | 2213 | 6.7%

GENDER
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 2714
Missing (%): 49.1%
Memory size: 43.3 KiB

F | 1443
M | 1373

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2816
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: F
4th row: F
5th row: M

Common Values

Value | Count | Frequency (%)
F | 1443 | 26.1%
M | 1373 | 24.8%
(Missing) | 2714 | 49.1%
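
The same counts appear with two different percentages in this section: 26.1% / 24.8% in the table above and 51.2% / 48.8% in the later character tables. A minimal sketch, assuming the first pair is relative to all rows and the second to non-missing rows only (the variable names are illustrative):

```python
counts = {"F": 1443, "M": 1373}
n_missing = 2714

n_present = sum(counts.values())   # rows with a GENDER value
n_rows = n_present + n_missing     # all rows in the dataset

shares = {}
for value, count in counts.items():
    share_of_all = 100 * count / n_rows
    share_of_present = 100 * count / n_present
    shares[value] = (round(share_of_all, 1), round(share_of_present, 1))
    print(f"{value}: {share_of_all:.1f}% of all rows, "
          f"{share_of_present:.1f}% of non-missing rows")
```

With nearly half of the values missing, the choice of denominator roughly doubles the reported share, so it matters which table a figure is quoted from.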

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
f | 1443 | 51.2%
m | 1373 | 48.8%

Most occurring characters

Value | Count | Frequency (%)
F | 1443 | 51.2%
M | 1373 | 48.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2816 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
F | 1443 | 51.2%
M | 1373 | 48.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2816 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
F | 1443 | 51.2%
M | 1373 | 48.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2816 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
F | 1443 | 51.2%
M | 1373 | 48.8%

BALANCE
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5525
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1041.700463
Minimum: -4587.892398
Maximum: 7390.19856
Zeros: 6
Zeros (%): 0.1%
Negative: 165
Negative (%): 3.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: -4587.892398
5-th percentile: 3.88240525
Q1: 74.060304
median: 632.7436345
Q3: 1545.808455
95-th percentile: 3869.371332
Maximum: 7390.19856
Range: 11978.09096
Interquartile range (IQR): 1471.748151

Descriptive statistics

Standard deviation: 1353.093044
Coefficient of variation (CV): 1.29892718
Kurtosis: 3.290218207
Mean: 1041.700463
Median Absolute Deviation (MAD): 594.745598
Skewness: 1.475458824
Sum: 5760603.559
Variance: 1830860.785
Monotonicity: Not monotonic
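
Several of the BALANCE statistics are derived from others in the same tables. The sketch below recomputes them from the reported values using the standard textbook definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); that the report uses exactly these definitions is an assumption, although the numbers agree to the printed precision.

```python
# Values copied from the BALANCE summary tables.
minimum, maximum = -4587.892398, 7390.19856
q1, q3 = 74.060304, 1545.808455
mean, std = 1041.700463, 1353.093044

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50%
cv = std / mean                   # dispersion relative to the mean
variance = std ** 2

print(f"range:    {value_range:.5f}")
print(f"IQR:      {iqr:.6f}")
print(f"CV:       {cv:.6f}")
print(f"variance: {variance:.3f}")
```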
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 6 | 0.1%
1132.615315 | 1 | < 0.1%
1.155609 | 1 | < 0.1%
911.281862 | 1 | < 0.1%
28.486124 | 1 | < 0.1%
1839.93046 | 1 | < 0.1%
1886.811282 | 1 | < 0.1%
15.728894 | 1 | < 0.1%
1155.338824 | 1 | < 0.1%
25.950664 | 1 | < 0.1%
Other values (5515) | 5515 | 99.7%

Value | Count | Frequency (%)
-4587.892398 | 1 | < 0.1%
-4530.639094 | 1 | < 0.1%
-4251.411617 | 1 | < 0.1%
-4071.993764 | 1 | < 0.1%
-3948.776884 | 1 | < 0.1%
-3876.778302 | 1 | < 0.1%
-3699.694691 | 1 | < 0.1%
-3474.972612 | 1 | < 0.1%
-3433.295973 | 1 | < 0.1%
-3207.605367 | 1 | < 0.1%

Value | Count | Frequency (%)
7390.19856 | 1 | < 0.1%
7347.355967 | 1 | < 0.1%
7293.108794 | 1 | < 0.1%
7215.745096 | 1 | < 0.1%
7152.864372 | 1 | < 0.1%
7005.310696 | 1 | < 0.1%
6980.228444 | 1 | < 0.1%
6958.239974 | 1 | < 0.1%
6950.583049 | 1 | < 0.1%
6943.433775 | 1 | < 0.1%

PURCHASES
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 3682
Distinct (%): 66.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 534.5771031
Minimum: 0
Maximum: 9661.37
Zeros: 1393
Zeros (%): 25.2%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 269.13
Q3: 723.7
95-th percentile: 1975.906
Maximum: 9661.37
Range: 9661.37
Interquartile range (IQR): 723.7

Descriptive statistics

Standard deviation: 773.4887449
Coefficient of variation (CV): 1.446917087
Kurtosis: 18.5878817
Mean: 534.5771031
Median Absolute Deviation (MAD): 269.13
Skewness: 3.268794177
Sum: 2956211.38
Variance: 598284.8385
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1393 | 25.2%
45.65 | 21 | 0.4%
150 | 14 | 0.3%
60 | 12 | 0.2%
450 | 10 | 0.2%
100 | 10 | 0.2%
50 | 9 | 0.2%
250 | 9 | 0.2%
600 | 9 | 0.2%
120 | 9 | 0.2%
Other values (3672) | 4034 | 72.9%

Value | Count | Frequency (%)
0 | 1393 | 25.2%
0.01 | 3 | 0.1%
0.05 | 1 | < 0.1%
0.24 | 1 | < 0.1%
1 | 2 | < 0.1%
4.8 | 1 | < 0.1%
4.99 | 1 | < 0.1%
6.9 | 1 | < 0.1%
7.26 | 1 | < 0.1%
8.4 | 3 | 0.1%

Value | Count | Frequency (%)
9661.37 | 1 | < 0.1%
8945.67 | 1 | < 0.1%
8834.96 | 1 | < 0.1%
8591.31 | 1 | < 0.1%
7311.99 | 1 | < 0.1%
6520 | 1 | < 0.1%
6398.73 | 1 | < 0.1%
5855.46 | 1 | < 0.1%
5812.17 | 1 | < 0.1%
5788.81 | 1 | < 0.1%

BALANCE_FREQUENCY
Real number (ℝ≥0)

Distinct: 58
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.48255227
Minimum: 0
Maximum: 1000
Zeros: 6
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.363636
Q1: 0.833333
median: 1
Q3: 1
95-th percentile: 1
Maximum: 1000
Range: 1000
Interquartile range (IQR): 0.166667

Descriptive statistics

Standard deviation: 152.899316
Coefficient of variation (CV): 5.773586866
Kurtosis: 34.06665053
Mean: 26.48255227
Median Absolute Deviation (MAD): 0
Skewness: 5.96293972
Sum: 146448.5141
Variance: 23378.20083
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 3554 | 64.3%
0.909091 | 275 | 5.0%
0.818182 | 188 | 3.4%
0.545455 | 158 | 2.9%
0.636364 | 147 | 2.7%
0.727273 | 145 | 2.6%
0.454545 | 135 | 2.4%
0.363636 | 125 | 2.3%
1000 | 111 | 2.0%
0.272727 | 110 | 2.0%
Other values (48) | 582 | 10.5%

Value | Count | Frequency (%)
0 | 6 | 0.1%
0.090909 | 23 | 0.4%
0.1 | 1 | < 0.1%
0.125 | 2 | < 0.1%
0.142857 | 1 | < 0.1%
0.166667 | 1 | < 0.1%
0.181818 | 89 | 1.6%
0.2 | 5 | 0.1%
0.222222 | 2 | < 0.1%
0.25 | 4 | 0.1%

Value | Count | Frequency (%)
1000 | 111 | 2.0%
909.091 | 9 | 0.2%
888.889 | 1 | < 0.1%
857.143 | 2 | < 0.1%
833.333 | 1 | < 0.1%
818.182 | 7 | 0.1%
727.273 | 3 | 0.1%
636.364 | 6 | 0.1%
545.455 | 4 | 0.1%
454.545 | 3 | 0.1%
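
The largest BALANCE_FREQUENCY values (1000, 909.091, 818.182, ...) mirror the common values on the 0 to 1 scale (1, 0.909091, 0.818182, ...) multiplied by 1000. The sketch below shows one way such values could be brought back to a common scale. It rests on an unconfirmed assumption that every value above 1 was scaled by a factor of 1000; the function name `rescale_frequency` and the `factor` parameter are hypothetical.

```python
def rescale_frequency(value, factor=1000.0):
    """Return `value` on the 0..1 scale.

    Assumption (not confirmed by the report): any value above 1 was
    multiplied by `factor` somewhere upstream and can be divided back.
    """
    if value > 1.0:
        return value / factor
    return value


# A few values taken from the tables above.
observed = [1.0, 0.909091, 1000.0, 909.091, 545.455, 0.0]
cleaned = [round(rescale_frequency(v), 6) for v in observed]
print(cleaned)
```

The threshold of 1 is a heuristic tied to this column. It would not suit CASH_ADVANCE_FREQUENCY, whose reported maximum of 1.5 exceeds 1 without looking like a scaled value.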

CASH_ADVANCE
Categorical

HIGH CARDINALITY

Distinct: 2609
Distinct (%): 47.2%
Missing: 0
Missing (%): 0.0%
Memory size: 43.3 KiB

0.0 | 2808
?? | 75
0.0?ñ | 41
823.979128 | 1
181.735735 | 1
Other values (2604) | 2604

Length

Max length: 13
Median length: 3
Mean length: 6.456057866
Min length: 2

Characters and Unicode

Total characters: 35702
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2606
Unique (%): 47.1%

Sample

1st row: 472.818286
2nd row: 642.862505
3rd row: 0.0
4th row: 0.0
5th row: 2183.782456

Common Values

Value | Count | Frequency (%)
0.0 | 2808 | 50.8%
?? | 75 | 1.4%
0.0?ñ | 41 | 0.7%
823.979128 | 1 | < 0.1%
181.735735 | 1 | < 0.1%
110.795488 | 1 | < 0.1%
233.246267 | 1 | < 0.1%
148.542419 | 1 | < 0.1%
1116.128466 | 1 | < 0.1%
56.644654 | 1 | < 0.1%
Other values (2599) | 2599 | 47.0%
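
CASH_ADVANCE holds mostly numeric text, and the `??` and `0.0?ñ` entries are a plausible reason it was typed as Categorical rather than numeric. The sketch below shows one conservative way to separate clean numbers from corrupted entries. It assumes corrupted values should be flagged rather than repaired; the helper name `parse_amount` is hypothetical.

```python
def parse_amount(raw):
    """Return `raw` as a float, or None if it is not a clean number.

    Assumption: placeholders such as '??' and values with trailing
    debris such as '0.0?ñ' are treated as invalid, not repaired.
    """
    try:
        return float(raw)
    except ValueError:
        return None


# Sample rows and common values taken from this section.
raw_values = ["472.818286", "0.0", "??", "0.0?ñ", "2183.782456"]
parsed = [parse_amount(v) for v in raw_values]
print(parsed)
```

Whether `0.0?ñ` should instead be read as `0.0` is a judgment call that the report itself cannot settle.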

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
0.0 | 2808 | 50.8%
(empty) | 75 | 1.4%
0.0?ñ | 41 | 0.7%
181.735735 | 1 | < 0.1%
2327.566908 | 1 | < 0.1%
110.795488 | 1 | < 0.1%
233.246267 | 1 | < 0.1%
148.542419 | 1 | < 0.1%
1116.128466 | 1 | < 0.1%
56.644654 | 1 | < 0.1%
Other values (2599) | 2599 | 47.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 7537 | 21.1%
. | 5455 | 15.3%
1 | 3119 | 8.7%
2 | 2707 | 7.6%
3 | 2485 | 7.0%
4 | 2431 | 6.8%
9 | 2430 | 6.8%
8 | 2375 | 6.7%
5 | 2313 | 6.5%
7 | 2303 | 6.5%
Other values (3) | 2547 | 7.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 29929 | 83.8%
Other Punctuation | 5689 | 15.9%
Lowercase Letter | 84 | 0.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 7537 | 25.2%
1 | 3119 | 10.4%
2 | 2707 | 9.0%
3 | 2485 | 8.3%
4 | 2431 | 8.1%
9 | 2430 | 8.1%
8 | 2375 | 7.9%
5 | 2313 | 7.7%
7 | 2303 | 7.7%
6 | 2229 | 7.4%
Other Punctuation
Value | Count | Frequency (%)
. | 5455 | 95.9%
? | 234 | 4.1%
Lowercase Letter
Value | Count | Frequency (%)
ñ | 84 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 35618 | 99.8%
Latin | 84 | 0.2%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 7537 | 21.2%
. | 5455 | 15.3%
1 | 3119 | 8.8%
2 | 2707 | 7.6%
3 | 2485 | 7.0%
4 | 2431 | 6.8%
9 | 2430 | 6.8%
8 | 2375 | 6.7%
5 | 2313 | 6.5%
7 | 2303 | 6.5%
Other values (2) | 2463 | 6.9%
Latin
Value | Count | Frequency (%)
ñ | 84 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 35618 | 99.8%
Latin 1 Sup | 84 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 7537 | 21.2%
. | 5455 | 15.3%
1 | 3119 | 8.8%
2 | 2707 | 7.6%
3 | 2485 | 7.0%
4 | 2431 | 6.8%
9 | 2430 | 6.8%
8 | 2375 | 6.7%
5 | 2313 | 6.5%
7 | 2303 | 6.5%
Other values (2) | 2463 | 6.9%
Latin 1 Sup
Value | Count | Frequency (%)
ñ | 84 | 100.0%

CASH_ADVANCE_TRX
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING
ZEROS

Distinct: 34
Distinct (%): 0.6%
Missing: 150
Missing (%): 2.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.11542751
Minimum: 0
Maximum: 18000
Zeros: 2812
Zeros (%): 50.8%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 3
95-th percentile: 12
Maximum: 18000
Range: 18000
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 573.8177709
Coefficient of variation (CV): 11.68304543
Kurtosis: 469.4166907
Mean: 49.11542751
Median Absolute Deviation (MAD): 0
Skewness: 19.33841254
Sum: 264241
Variance: 329266.8342
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=34)
Value | Count | Frequency (%)
0 | 2812 | 50.8%
1 | 562 | 10.2%
2 | 393 | 7.1%
3 | 290 | 5.2%
4 | 234 | 4.2%
5 | 204 | 3.7%
6 | 159 | 2.9%
7 | 130 | 2.4%
8 | 105 | 1.9%
10 | 83 | 1.5%
Other values (24) | 408 | 7.4%
(Missing) | 150 | 2.7%

Value | Count | Frequency (%)
0 | 2812 | 50.8%
1 | 562 | 10.2%
2 | 393 | 7.1%
3 | 290 | 5.2%
4 | 234 | 4.2%
5 | 204 | 3.7%
6 | 159 | 2.9%
7 | 130 | 2.4%
8 | 105 | 1.9%
9 | 64 | 1.2%

Value | Count | Frequency (%)
18000 | 1 | < 0.1%
17000 | 1 | < 0.1%
14000 | 1 | < 0.1%
12000 | 1 | < 0.1%
10000 | 1 | < 0.1%
8000 | 2 | < 0.1%
7000 | 1 | < 0.1%
6000 | 3 | 0.1%
5000 | 7 | 0.1%
4000 | 5 | 0.1%
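
The mean of CASH_ADVANCE_TRX (about 49.1) sits far above its median (0) and 95th percentile (12). The sketch below recomputes the mean from the reported sum and row counts, then estimates how much of that sum comes from the ten largest distinct values listed above. It assumes the mean is taken over non-missing rows only; the variable names are illustrative.

```python
total_sum = 264241           # "Sum" from the descriptive statistics
n_rows, n_missing = 5530, 150

# Assumption: missing rows are excluded from the mean.
n_present = n_rows - n_missing
mean = total_sum / n_present

# Ten largest distinct values and their counts, from the table above.
largest = {18000: 1, 17000: 1, 14000: 1, 12000: 1, 10000: 1,
           8000: 2, 7000: 1, 6000: 3, 5000: 7, 4000: 5}
top_sum = sum(value * count for value, count in largest.items())
top_rows = sum(largest.values())
share = 100 * top_sum / total_sum

print(f"mean over {n_present} rows: {mean:.4f}")
print(f"{top_rows} rows account for {share:.1f}% of the sum")
```

Under that assumption, a few dozen rows carry most of the column total, which is why the median and percentiles describe typical customers better than the mean does here.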

PURCHASES_FREQUENCY
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 69
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.20600598
Minimum: 0
Maximum: 1000
Zeros: 1392
Zeros (%): 25.2%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.363636
Q3: 0.833333
95-th percentile: 1
Maximum: 1000
Range: 1000
Interquartile range (IQR): 0.833333

Descriptive statistics

Standard deviation: 93.75767056
Coefficient of variation (CV): 7.681273525
Kurtosis: 82.01112325
Mean: 12.20600598
Median Absolute Deviation (MAD): 0.363636
Skewness: 8.892601479
Sum: 67499.21305
Variance: 8790.500789
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1392 | 25.2%
1 | 881 | 15.9%
0.083333 | 465 | 8.4%
0.5 | 277 | 5.0%
0.166667 | 274 | 5.0%
0.25 | 237 | 4.3%
0.333333 | 233 | 4.2%
0.833333 | 230 | 4.2%
0.416667 | 216 | 3.9%
0.666667 | 211 | 3.8%
Other values (59) | 1114 | 20.1%

Value | Count | Frequency (%)
0 | 1392 | 25.2%
0.083333 | 465 | 8.4%
0.090909 | 35 | 0.6%
0.1 | 18 | 0.3%
0.111111 | 12 | 0.2%
0.125 | 20 | 0.4%
0.142857 | 17 | 0.3%
0.166667 | 274 | 5.0%
0.181818 | 11 | 0.2%
0.2 | 15 | 0.3%

Value | Count | Frequency (%)
1000 | 26 | 0.5%
916.667 | 6 | 0.1%
900 | 1 | < 0.1%
857.143 | 1 | < 0.1%
833.333 | 4 | 0.1%
818.182 | 1 | < 0.1%
750 | 5 | 0.1%
714.286 | 1 | < 0.1%
700 | 2 | < 0.1%
666.667 | 1 | < 0.1%

PURCHASES_TRX
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 80
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 43.3 KiB

0 | 1353
1 | 460
12 | 387
2 | 252
6 | 243
Other values (75) | 2835

Length

Max length: 7
Median length: 1
Mean length: 1.45045208
Min length: 1

Characters and Unicode

Total characters: 8021
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 0.2%

Sample

1st row: 2
2nd row: 0
3rd row: 12
4th row: 14
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 1353 | 24.5%
1 | 460 | 8.3%
12 | 387 | 7.0%
2 | 252 | 4.6%
6 | 243 | 4.4%
4 | 203 | 3.7%
3 | 196 | 3.5%
5 | 190 | 3.4%
8 | 186 | 3.4%
7 | 184 | 3.3%
Other values (70) | 1876 | 33.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
0 | 1353 | 24.5%
1 | 460 | 8.3%
12 | 387 | 7.0%
2 | 252 | 4.6%
6 | 243 | 4.4%
4 | 203 | 3.7%
3 | 196 | 3.5%
5 | 190 | 3.4%
8 | 186 | 3.4%
7 | 184 | 3.3%
Other values (70) | 1876 | 33.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 2014 | 25.1%
0 | 1989 | 24.8%
2 | 1285 | 16.0%
3 | 410 | 5.1%
4 | 408 | 5.1%
6 | 393 | 4.9%
7 | 330 | 4.1%
5 | 328 | 4.1%
8 | 299 | 3.7%
9 | 271 | 3.4%
Other values (2) | 294 | 3.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 7727 | 96.3%
Other Punctuation | 212 | 2.6%
Lowercase Letter | 82 | 1.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 2014 | 26.1%
0 | 1989 | 25.7%
2 | 1285 | 16.6%
3 | 410 | 5.3%
4 | 408 | 5.3%
6 | 393 | 5.1%
7 | 330 | 4.3%
5 | 328 | 4.2%
8 | 299 | 3.9%
9 | 271 | 3.5%
Other Punctuation
Value | Count | Frequency (%)
? | 212 | 100.0%
Lowercase Letter
Value | Count | Frequency (%)
ñ | 82 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 7939 | 99.0%
Latin | 82 | 1.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 2014 | 25.4%
0 | 1989 | 25.1%
2 | 1285 | 16.2%
3 | 410 | 5.2%
4 | 408 | 5.1%
6 | 393 | 5.0%
7 | 330 | 4.2%
5 | 328 | 4.1%
8 | 299 | 3.8%
9 | 271 | 3.4%
Latin
Value | Count | Frequency (%)
ñ | 82 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7939 | 99.0%
Latin 1 Sup | 82 | 1.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 2014 | 25.4%
0 | 1989 | 25.1%
2 | 1285 | 16.2%
3 | 410 | 5.2%
4 | 408 | 5.1%
6 | 393 | 5.0%
7 | 330 | 4.2%
5 | 328 | 4.1%
8 | 299 | 3.8%
9 | 271 | 3.4%
Latin 1 Sup
Value | Count | Frequency (%)
ñ | 82 | 100.0%

ONEOFF_PURCHASES_FREQUENCY
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING
ZEROS

Distinct: 41
Distinct (%): 1.5%
Missing: 2740
Missing (%): 49.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1482977523
Minimum: 0
Maximum: 1
Zeros: 1464
Zeros (%): 26.5%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0.166667
95-th percentile: 0.75
Maximum: 1
Range: 1
Interquartile range (IQR): 0.166667

Descriptive statistics

Standard deviation: 0.241687055
Coefficient of variation (CV): 1.629741862
Kurtosis: 3.442475174
Mean: 0.1482977523
Median Absolute Deviation (MAD): 0
Skewness: 2.013350785
Sum: 413.750729
Variance: 0.05841263257
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=41)
Value | Count | Frequency (%)
0 | 1464 | 26.5%
0.083333 | 376 | 6.8%
0.166667 | 214 | 3.9%
0.25 | 131 | 2.4%
0.333333 | 88 | 1.6%
0.416667 | 83 | 1.5%
1 | 64 | 1.2%
0.5 | 61 | 1.1%
0.583333 | 42 | 0.8%
0.666667 | 38 | 0.7%
Other values (31) | 229 | 4.1%
(Missing) | 2740 | 49.5%

Value | Count | Frequency (%)
0 | 1464 | 26.5%
0.083333 | 376 | 6.8%
0.090909 | 23 | 0.4%
0.1 | 13 | 0.2%
0.111111 | 11 | 0.2%
0.125 | 11 | 0.2%
0.142857 | 14 | 0.3%
0.166667 | 214 | 3.9%
0.181818 | 14 | 0.3%
0.2 | 12 | 0.2%

Value | Count | Frequency (%)
1 | 64 | 1.2%
0.916667 | 28 | 0.5%
0.909091 | 1 | < 0.1%
0.875 | 1 | < 0.1%
0.833333 | 21 | 0.4%
0.818182 | 1 | < 0.1%
0.75 | 36 | 0.7%
0.727273 | 1 | < 0.1%
0.714286 | 2 | < 0.1%
0.7 | 4 | 0.1%

CASH_ADVANCE_FREQUENCY
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING
ZEROS

Distinct: 46
Distinct (%): 0.9%
Missing: 166
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1190054092
Minimum: 0
Maximum: 1.5
Zeros: 2801
Zeros (%): 50.7%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0.166667
95-th percentile: 0.5
Maximum: 1.5
Range: 1.5
Interquartile range (IQR): 0.166667

Descriptive statistics

Standard deviation: 0.1732062886
Coefficient of variation (CV): 1.455448872
Kurtosis: 3.499384508
Mean: 0.1190054092
Median Absolute Deviation (MAD): 0
Skewness: 1.786846819
Sum: 638.345015
Variance: 0.03000041842
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Value | Count | Frequency (%)
0 | 2801 | 50.7%
0.083333 | 664 | 12.0%
0.166667 | 466 | 8.4%
0.25 | 360 | 6.5%
0.333333 | 258 | 4.7%
0.416667 | 155 | 2.8%
0.5 | 105 | 1.9%
0.583333 | 75 | 1.4%
0.666667 | 56 | 1.0%
0.090909 | 49 | 0.9%
Other values (36) | 375 | 6.8%
(Missing) | 166 | 3.0%

Value | Count | Frequency (%)
0 | 2801 | 50.7%
0.083333 | 664 | 12.0%
0.090909 | 49 | 0.9%
0.1 | 28 | 0.5%
0.111111 | 18 | 0.3%
0.125 | 33 | 0.6%
0.142857 | 33 | 0.6%
0.166667 | 466 | 8.4%
0.181818 | 27 | 0.5%
0.2 | 15 | 0.3%

Value | Count | Frequency (%)
1.5 | 1 | < 0.1%
1.166667 | 1 | < 0.1%
1 | 4 | 0.1%
0.916667 | 2 | < 0.1%
0.9 | 1 | < 0.1%
0.875 | 1 | < 0.1%
0.857143 | 4 | 0.1%
0.833333 | 8 | 0.1%
0.8 | 3 | 0.1%
0.777778 | 1 | < 0.1%

CREDIT_LIMIT
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 134
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3588.095256
Minimum: 50
Maximum: 12500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 50
5-th percentile: 1000
Q1: 1500
median: 2900
Q3: 5000
95-th percentile: 9000
Maximum: 12500
Range: 12450
Interquartile range (IQR): 3500

Descriptive statistics

Standard deviation: 2640.396238
Coefficient of variation (CV): 0.7358768509
Kurtosis: 0.5970263702
Mean: 3588.095256
Median Absolute Deviation (MAD): 1500
Skewness: 1.145162447
Sum: 19842166.77
Variance: 6971692.293
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3000 | 563 | 10.2%
1500 | 542 | 9.8%
1200 | 457 | 8.3%
1000 | 454 | 8.2%
2500 | 426 | 7.7%
4000 | 317 | 5.7%
6000 | 281 | 5.1%
2000 | 280 | 5.1%
5000 | 225 | 4.1%
7000 | 147 | 2.7%
Other values (124) | 1838 | 33.2%

Value | Count | Frequency (%)
50 | 1 | < 0.1%
150 | 4 | 0.1%
200 | 3 | 0.1%
300 | 12 | 0.2%
400 | 2 | < 0.1%
450 | 4 | 0.1%
500 | 87 | 1.6%
600 | 15 | 0.3%
700 | 17 | 0.3%
750 | 3 | 0.1%

Value | Count | Frequency (%)
12500 | 12 | 0.2%
12000 | 31 | 0.6%
11500 | 25 | 0.5%
11000 | 25 | 0.5%
10750 | 1 | < 0.1%
10500 | 41 | 0.7%
10400 | 1 | < 0.1%
10000 | 69 | 1.2%
9950 | 1 | < 0.1%
9700 | 1 | < 0.1%

PAYMENTS
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5530
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1107.989817
Minimum: 0.056466
Maximum: 9933.62261
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 43.3 KiB

Quantile statistics

Minimum: 0.056466
5-th percentile: 124.9274707
Q1: 345.4311015
median: 671.0016995
Q3: 1354.931507
95-th percentile: 3710.658747
Maximum: 9933.62261
Range: 9933.566144
Interquartile range (IQR): 1009.500406

Descriptive statistics

Standard deviation: 1270.892564
Coefficient of variation (CV): 1.147025491
Kurtosis: 9.951139009
Mean: 1107.989817
Median Absolute Deviation (MAD): 399.8415645
Skewness: 2.78151989
Sum: 6127183.69
Variance: 1615167.91
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
74.750172 | 1 | < 0.1%
567.149683 | 1 | < 0.1%
660.566074 | 1 | < 0.1%
266.472386 | 1 | < 0.1%
1230.180271 | 1 | < 0.1%
474.469741 | 1 | < 0.1%
1262.938606 | 1 | < 0.1%
728.19268 | 1 | < 0.1%
654.1843 | 1 | < 0.1%
3189.923177 | 1 | < 0.1%
Other values (5520) | 5520 | 99.8%

Value | Count | Frequency (%)
0.056466 | 1 | < 0.1%
3.500505 | 1 | < 0.1%
4.523555 | 1 | < 0.1%
4.841543 | 1 | < 0.1%
9.533313 | 1 | < 0.1%
12.773144 | 1 | < 0.1%
16.385421 | 1 | < 0.1%
18.125527 | 1 | < 0.1%
18.208604 | 1 | < 0.1%
18.336805 | 1 | < 0.1%

Value | Count | Frequency (%)
9933.62261 | 1 | < 0.1%
9858.055448 | 1 | < 0.1%
9801.637331 | 1 | < 0.1%
9724.871142 | 1 | < 0.1%
9614.697558 | 1 | < 0.1%
9307.719055 | 1 | < 0.1%
9076.561132 | 1 | < 0.1%
8972.867229 | 1 | < 0.1%
8919.228234 | 1 | < 0.1%
8805.280436 | 1 | < 0.1%

MINIMUM_PAYMENTS
Categorical

HIGH CARDINALITY

Distinct: 5441
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Memory size: 43.3 KiB

?? | 89
299.351881 | 2
119.325453 | 1
967.565584 | 1
175.245981 | 1
Other values (5436) | 5436

Length

Max length: 13
Median length: 10
Mean length: 9.779385172
Min length: 2

Characters and Unicode

Total characters: 54080
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5439
Unique (%): 98.4%

Sample

1st row: 56.999671
2nd row: 195.162256
3rd row: 270.413449
4th row: 194.534934
5th row: 1129.747227

Common Values

Value | Count | Frequency (%)
?? | 89 | 1.6%
299.351881 | 2 | < 0.1%
119.325453 | 1 | < 0.1%
967.565584 | 1 | < 0.1%
175.245981 | 1 | < 0.1%
1013.780486 | 1 | < 0.1%
373.884808 | 1 | < 0.1%
705.810164 | 1 | < 0.1%
926.087148 | 1 | < 0.1%
189.459157 | 1 | < 0.1%
Other values (5431) | 5431 | 98.2%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
(empty) | 89 | 1.6%
299.351881 | 2 | < 0.1%
432.876404 | 1 | < 0.1%
431.443631 | 1 | < 0.1%
967.565584 | 1 | < 0.1%
175.245981 | 1 | < 0.1%
1013.780486 | 1 | < 0.1%
373.884808 | 1 | < 0.1%
705.810164 | 1 | < 0.1%
926.087148 | 1 | < 0.1%
Other values (5431) | 5431 | 98.2%

Most occurring characters

Value | Count | Frequency (%)
1 | 6665 | 12.3%
. | 5441 | 10.1%
2 | 5081 | 9.4%
4 | 4849 | 9.0%
3 | 4794 | 8.9%
5 | 4701 | 8.7%
7 | 4670 | 8.6%
6 | 4669 | 8.6%
8 | 4598 | 8.5%
9 | 4539 | 8.4%
Other values (3) | 4073 | 7.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 48305 | 89.3%
Other Punctuation | 5697 | 10.5%
Lowercase Letter | 78 | 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 6665 | 13.8%
2 | 5081 | 10.5%
4 | 4849 | 10.0%
3 | 4794 | 9.9%
5 | 4701 | 9.7%
7 | 4670 | 9.7%
6 | 4669 | 9.7%
8 | 4598 | 9.5%
9 | 4539 | 9.4%
0 | 3739 | 7.7%
Other Punctuation
Value | Count | Frequency (%)
. | 5441 | 95.5%
? | 256 | 4.5%
Lowercase Letter
Value | Count | Frequency (%)
ñ | 78 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 54002 | 99.9%
Latin | 78 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 6665 | 12.3%
. | 5441 | 10.1%
2 | 5081 | 9.4%
4 | 4849 | 9.0%
3 | 4794 | 8.9%
5 | 4701 | 8.7%
7 | 4670 | 8.6%
6 | 4669 | 8.6%
8 | 4598 | 8.5%
9 | 4539 | 8.4%
Other values (2) | 3995 | 7.4%
Latin
Value | Count | Frequency (%)
ñ | 78 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 54002 | 99.9%
Latin 1 Sup | 78 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 6665 | 12.3%
. | 5441 | 10.1%
2 | 5081 | 9.4%
4 | 4849 | 9.0%
3 | 4794 | 8.9%
5 | 4701 | 8.7%
7 | 4670 | 8.6%
6 | 4669 | 8.6%
8 | 4598 | 8.5%
9 | 4539 | 8.4%
Other values (2) | 3995 | 7.4%
Latin 1 Sup
Value | Count | Frequency (%)
ñ | 78 | 100.0%

TENURE
Categorical

MISSING

Distinct: 19
Distinct (%): 0.4%
Missing: 163
Missing (%): 2.9%
Memory size: 43.3 KiB

12 | 4226
11 | 224
10 | 149
6 | 135
7 | 125
Other values (14) | 508

Length

Max length: 4
Median length: 2
Mean length: 1.958263462
Min length: 1

Characters and Unicode

Total characters: 10510
Distinct characters: 10
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 8
2nd row: 12
3rd row: -12
4th row: 12
5th row: 12

Common Values

Value | Count | Frequency (%)
12 | 4226 | 76.4%
11 | 224 | 4.1%
10 | 149 | 2.7%
6 | 135 | 2.4%
7 | 125 | 2.3%
-12 | 124 | 2.2%
8 | 119 | 2.2%
9 | 108 | 2.0%
?? | 69 | 1.2%
12?ñ | 56 | 1.0%
Other values (9) | 32 | 0.6%
(Missing) | 163 | 2.9%
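
TENURE mixes plain integers with negative values (`-12`), values carrying trailing debris (`12?ñ`), and `??` placeholders. The sketch below shows one possible normalization, under two loudly stated assumptions that the report does not confirm: that a leading minus sign is an entry error, and that anything after the leading digits is debris. The helper name `clean_tenure` is hypothetical.

```python
import re

# Leading optional minus sign followed by digits.
_LEADING_INT = re.compile(r"-?\d+")


def clean_tenure(raw):
    """Return the tenure as a non-negative int, or None if no digits lead.

    Assumptions: a minus sign is an entry error (so the absolute value
    is kept) and trailing characters such as '?ñ' are debris.
    """
    match = _LEADING_INT.match(raw)
    if match is None:
        return None
    return abs(int(match.group()))


# Values taken from the table above.
raw_values = ["8", "12", "-12", "12?ñ", "??"]
cleaned = [clean_tenure(v) for v in raw_values]
print(cleaned)
```

If negative tenures are meaningful in the source system, the `abs` call should be dropped and those rows reviewed instead.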

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
12 | 4350 | 81.1%
11 | 232 | 4.3%
10 | 155 | 2.9%
6 | 135 | 2.5%
7 | 131 | 2.4%
8 | 121 | 2.3%
9 | 110 | 2.0%
(empty) | 69 | 1.3%
12?ñ | 56 | 1.0%
11?ñ | 3 | 0.1%
Other values (3) | 5 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
1 | 5033 | 47.9%
2 | 4406 | 41.9%
? | 202 | 1.9%
0 | 157 | 1.5%
- | 148 | 1.4%
6 | 135 | 1.3%
7 | 131 | 1.2%
8 | 123 | 1.2%
9 | 111 | 1.1%
ñ | 64 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 10096 | 96.1%
Other Punctuation | 202 | 1.9%
Dash Punctuation | 148 | 1.4%
Lowercase Letter | 64 | 0.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 5033 | 49.9%
2 | 4406 | 43.6%
0 | 157 | 1.6%
6 | 135 | 1.3%
7 | 131 | 1.3%
8 | 123 | 1.2%
9 | 111 | 1.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 148 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
? | 202 | 100.0%
Lowercase Letter
Value | Count | Frequency (%)
ñ | 64 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 10446 | 99.4%
Latin | 64 | 0.6%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 5033 | 48.2%
2 | 4406 | 42.2%
? | 202 | 1.9%
0 | 157 | 1.5%
- | 148 | 1.4%
6 | 135 | 1.3%
7 | 131 | 1.3%
8 | 123 | 1.2%
9 | 111 | 1.1%
Latin
Value | Count | Frequency (%)
ñ | 64 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10446 | 99.4%
Latin 1 Sup | 64 | 0.6%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 5033 | 48.2%
2 | 4406 | 42.2%
? | 202 | 1.9%
0 | 157 | 1.5%
- | 148 | 1.4%
6 | 135 | 1.3%
7 | 131 | 1.3%
8 | 123 | 1.2%
9 | 111 | 1.1%
Latin 1 Sup
Value | Count | Frequency (%)
ñ | 64 | 100.0%

Interactions

[Figures: 50 interaction plots (SVG, Matplotlib v3.3.2); plot content not recoverable from the extracted text]
2022-03-06T14:10:49.586869image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:49.683497image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:49.791648image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:49.912767image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.024375image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.139551image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.240214image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.342475image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.450092image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.553483image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.670090image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.783984image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:50.888903image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.007653image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.117721image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.214874image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.317596image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.418626image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.509804image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.602527image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.697887image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.890625image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:51.987715image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.079773image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.168416image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.256224image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.352854image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.441176image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.532113image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.624644image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2022-03-06T14:10:52.725305image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the slope of the line determines only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
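
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report generator uses; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)


# Hypothetical data: two exactly linear relationships with different slopes.
x = [1.0, 2.0, 3.0, 4.0]
y_up = [3.0, 5.0, 7.0, 9.0]      # y = 2x + 1
y_down = [10.0, 7.0, 4.0, 1.0]   # y = -3x + 13

print(pearson_r(x, y_up))    # close to +1
print(pearson_r(x, y_down))  # close to -1
```

Rescaling or shifting either variable (for example, replacing x with 10x + 5) leaves the result unchanged, which is the invariance property mentioned above.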

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
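
As a rough sketch of that procedure, the snippet below ranks each variable and then applies the Pearson formula to the ranks. The helper names and the sample data are hypothetical, and giving tied values the mean of their positions is one common convention assumed here rather than something the report states.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

print(spearman_rho(x, y))  # close to 1: the relationship is perfectly monotonic
print(pearson_r(x, y))     # below 1: the relationship is not linear
```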

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
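
The pair-counting rule can be illustrated with a short sketch. The formula as worded above corresponds to the variant usually called τ-a, which makes no adjustment for ties; statistical libraries often default to a tie-corrected variant, so results may differ on data with ties. The function name and sample data are assumptions made for the example.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = 0
    discordant = 0
    total_pairs = 0
    for i, j in combinations(range(len(xs)), 2):
        total_pairs += 1
        # The pair is concordant when x and y move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 is a tie: counted in the total but in neither group.
    return (concordant - discordant) / total_pairs


# Hypothetical data: 6 pairs in total, of which only the pair formed by the
# second and third observations is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

print(kendall_tau_a(x, y))  # (5 - 1) / 6
```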

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
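
For orientation, here is a minimal sketch of a bias-corrected Cramér's V following the commonly cited form of Bergsma's correction. It is an illustration under stated assumptions, not the report generator's own implementation: the function name, the list-of-rows contingency table input, and the sample tables are all hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r = len(table)
    k = len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one observation")

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical 2x2 tables.
independent = [[10, 20], [20, 40]]   # rows are proportional: no association
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column

print(cramers_v_corrected(independent))  # 0.0
print(cramers_v_corrected(perfect))      # close to 1
```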

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
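
One way to read the nullity correlation mentioned above is as the Pearson correlation between the presence indicators of two columns (1 where a value is present, 0 where it is missing). The sketch below assumes that interpretation; the function name, the use of `None` to mark missing values, and the sample columns are hypothetical.

```python
import math


def nullity_correlation(col_a, col_b):
    """Correlation between the presence indicators of two equal-length columns.

    Returns None when either column is fully present or fully missing,
    because a constant indicator has no defined correlation.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(present_a)
    mean_a = sum(present_a) / n
    mean_b = sum(present_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(present_a, present_b))
    var_a = sum((a - mean_a) ** 2 for a in present_a)
    var_b = sum((b - mean_b) ** 2 for b in present_b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Hypothetical columns; None marks a missing value.
a = [1.0, None, 3.0, None]
b = [None, 2.0, None, 4.0]   # present exactly where a is missing
c = [9.0, None, 7.0, None]   # missing exactly where a is missing
d = [1.0, 2.0, 3.0, 4.0]     # never missing

print(nullity_correlation(a, b))  # -1.0
print(nullity_correlation(a, c))  # 1.0
print(nullity_correlation(a, d))  # None
```

A value near +1 means the two columns tend to be present or missing together; a value near -1 means one tends to be present where the other is missing.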

Sample

First rows

index | CUST_ID | GENDER | BALANCE | PURCHASES | BALANCE_FREQUENCY | CASH_ADVANCE | CASH_ADVANCE_TRX | PURCHASES_FREQUENCY | PURCHASES_TRX | ONEOFF_PURCHASES_FREQUENCY | CASH_ADVANCE_FREQUENCY | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | TENURE
0 | C12529 | F | 107.944741 | 118.16 | 0.875000 | 472.818286 | 1.0 | 0.125000 | 2 | 0.125 | 0.125000 | 2500.0 | 192.781455 | 56.999671 | 8
1 | C14138 | NaN | 241.032979 | 0.00 | 1.000000 | 642.862505 | 1.0 | 0.000000 | 0 | NaN | 0.083333 | 1500.0 | 915.454305 | 195.162256 | 12
2 | C15409 | NaN | 894.357857 | 1164.00 | 1.000000 | 0.0 | 0.0 | 1.000000 | 12 | NaN | 0.000000 | 2000.0 | 907.603723 | 270.413449 | -12
3 | C18141 | F | -188.132508 | 515.88 | 1.000000 | 0.0 | NaN | 0.833333 | 14 | NaN | 0.000000 | 2700.0 | 601.729266 | 194.534934 | 12
4 | C15879 | NaN | 3881.679582 | 15.92 | 1.000000 | 2183.782456 | 9.0 | 0.083333 | 1 | NaN | 0.333333 | 5500.0 | 1032.183632 | 1129.747227 | 12
5 | C17660 | NaN | 1087.784698 | 0.00 | 1.000000 | 1562.703953 | 2.0 | 0.000000 | 0 | 0.000 | 0.166667 | 1500.0 | 3093.888643 | 298.011965 | 12
6 | C10916 | NaN | 1081.065726 | 554.85 | 1.000000 | 952.424906 | 8.0 | 0.500000 | 20 | 0.250 | 0.166667 | 2100.0 | 1898.828120 | 382.716751 | 12
7 | C15128 | NaN | 100.208311 | 0.00 | 0.909091 | 182.143966 | 1.0 | 0.000000 | 0 | NaN | 0.090909 | 3000.0 | 175.911508 | 145.244181 | 11
8 | C10109 | NaN | 862.072380 | 0.00 | 1.000000 | 920.309805 | 1.0 | 0.000000 | 0 | 0.000 | 0.083333 | 4000.0 | 2236.890255 | 214.828158 | 12
9 | C17983 | NaN | 1757.439933 | 0.00 | 0.833333 | 2408.007601 | 6.0 | 0.000000 | 0 | 0.000 | 0.166667 | 2500.0 | 175.115831 | 450.616731 | 6

Last rows

index | CUST_ID | GENDER | BALANCE | PURCHASES | BALANCE_FREQUENCY | CASH_ADVANCE | CASH_ADVANCE_TRX | PURCHASES_FREQUENCY | PURCHASES_TRX | ONEOFF_PURCHASES_FREQUENCY | CASH_ADVANCE_FREQUENCY | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | TENURE
5520 | C16104 | NaN | 2525.683344 | 0.00 | 1.000000 | 285.193204 | 5.0 | 0.000000 | 0 | NaN | 0.166667 | 7000.0 | 1483.384610 | 702.052491 | 12
5521 | C19019 | NaN | 634.514354 | 0.00 | 0.909091 | 1682.137421 | 12.0 | 0.000000 | 0 | 0.000000 | 0.636364 | 1500.0 | 2162.277429 | 257.081648 | 11
5522 | C18355 | NaN | 930.656420 | 300.05 | 1.000000 | 0.0 | 0.0 | 0.750000 | 9 | NaN | 0.000000 | 1200.0 | 513.064156 | 330.422815 | 12
5523 | C18766 | NaN | 21.168201 | 236.40 | 1.000000 | 0.0 | 0.0 | 1.000000 | 24 | 1.000000 | NaN | 2500.0 | 217.008342 | 178.169321 | 12
5524 | C16616 | NaN | 846.091011 | 2599.20 | 1.000000 | 0.0 | 0.0 | 0.916667 | 19 | 0.333333 | 0.000000 | 3000.0 | 1900.699307 | 195.516066 | 12
5525 | C10075 | NaN | 656.013010 | 0.00 | 1000.000000 | 1474.349901 | 3.0 | 0.000000 | 0 | 0.000000 | 0.125000 | 7000.0 | 910.457985 | 140.983193 | 8
5526 | C17321 | NaN | 15.232505 | 384.00 | 0.272727 | 0.0 | 0.0 | 1.000000 | 12?ñ | NaN | 0.000000 | 1500.0 | 568.982664 | 54.449416 | 12
5527 | C12909 | NaN | 1023.124791 | 1537.93 | 1.000000 | 247.04197 | 1.0 | 0.750000 | 25 | 0.583333 | 0.083333 | 9000.0 | 1070.149971 | 235.241959 | -12
5528 | C15615 | F | 957.010021 | 604.80 | 1.000000 | 901.754709 | 3.0 | 1.000000 | 12 | NaN | 0.083333 | 1000.0 | 811.457190 | 926.087148 | 12
5529 | C12391 | NaN | 2664.700424 | 715.51 | 1.000000 | 494.573662 | 1.0 | 750.000000 | 11 | 0.083333 | 0.083333 | 3500.0 | 918.003032 | 792.902894 | 12